Best Arm Identification for Contaminated Bandits
Authors
Abstract
This paper studies active learning in the context of robust statistics. Specifically, we propose the Contaminated Best Arm Identification variant of the multi-armed bandit problem, in which every arm pull has probability ε of generating a sample from an arbitrary contamination distribution instead of the true underlying distribution. The goal is to identify the best (or approximately best) true distribution with high probability, with a secondary goal of providing guarantees on the quality of that arm’s underlying distribution. It is simple to see that in this contamination model there are no consistent estimators for statistics (e.g. median) of the underlying distribution, and that even with infinite samples, statistics can be estimated only up to some unavoidable bias. We present tight, non-asymptotic sample complexity bounds for estimating the first two robust moments (median and median absolute deviation) with high probability. We then show how to use this algorithmically for our problem by adapting Best Arm Identification algorithms from the classical multi-armed bandit literature. We give matching upper and lower bounds (up to a small logarithmic factor) on these algorithms’ sample complexities. These results suggest an inherent robustness of classical Best Arm Identification algorithms.
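As a rough illustration of the contamination model described above, the following Python sketch draws each pull from an arbitrary contamination distribution with probability ε and otherwise from the arm's true distribution, then estimates the median and the median absolute deviation (MAD) of each arm. The function names, the Gaussian arms, the point-mass contamination, and the uniform sampling rule are all assumptions made for this example; this is not the paper's algorithm, only a naive baseline that picks the arm with the largest empirical median.

```python
import random
import statistics


def contaminated_sample(rng, eps, true_draw, contam_draw):
    """Draw one pull: contamination with probability eps, else the true arm."""
    if rng.random() < eps:
        return contam_draw(rng)
    return true_draw(rng)


def median_and_mad(samples):
    """Return the empirical median and median absolute deviation (MAD)."""
    med = statistics.median(samples)
    mad = statistics.median([abs(x - med) for x in samples])
    return med, mad


def best_arm_by_median(arms, eps, contam_draw, pulls_per_arm, seed=0):
    """Naive uniform-sampling baseline (hypothetical, not the paper's method).

    Pulls every arm the same number of times and returns the index of the
    arm with the largest empirical median, plus per-arm (median, MAD) pairs.
    """
    rng = random.Random(seed)
    estimates = []
    for true_draw in arms:
        samples = [
            contaminated_sample(rng, eps, true_draw, contam_draw)
            for _ in range(pulls_per_arm)
        ]
        estimates.append(median_and_mad(samples))
    best = max(range(len(arms)), key=lambda i: estimates[i][0])
    return best, estimates


if __name__ == "__main__":
    # Assumed setup: three unit-variance Gaussian arms, 10% contamination
    # by a point mass far below every arm.
    arms = [lambda r, m=m: r.gauss(m, 1.0) for m in (0.0, 1.0, 3.0)]
    best, estimates = best_arm_by_median(
        arms, eps=0.1, contam_draw=lambda r: -1000.0, pulls_per_arm=2000
    )
    print("selected arm:", best)
    for i, (med, mad) in enumerate(estimates):
        print(f"arm {i}: median={med:.3f}, MAD={mad:.3f}")
```

In this setup the empirical median of each arm stays within a small, ε-dependent bias of the true median even though the sample mean is dragged far away by the contamination, which is the kind of unavoidable bias the abstract refers to.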
Related resources
PAC Bandits with Risk Constraints
We study the problem of best arm identification with risk constraints within the setting of fixed-confidence pure-exploration bandits (PAC bandits). The goal is to stop as fast as possible and, with high confidence, return an arm whose mean is ε-close to the best arm among those that satisfy a risk constraint, namely their α-quantile functions are larger than a threshold β. For this risk-sensitiv...
Practical Algorithms for Best-K Identification in Multi-Armed Bandits
In the Best-K identification problem (Best-K-Arm), we are given N stochastic bandit arms with unknown reward distributions. Our goal is to identify the K arms with the largest means with high confidence, by drawing samples from the arms adaptively. This problem is motivated by various practical applications and has attracted considerable attention in the past decade. In this paper, we propose n...
Best-Arm Identification in Linear Bandits
We study the best-arm identification problem in linear bandits, where the rewards of the arms depend linearly on an unknown parameter θ and the objective is to return the arm with the largest reward. We characterize the complexity of the problem and introduce sample allocation strategies that pull arms to identify the best arm with a fixed confidence, while minimizing the sample budget. In parti...
One Practical Algorithm for Both Stochastic and Adversarial Bandits
We present an algorithm for multiarmed bandits that achieves almost optimal performance in both stochastic and adversarial regimes without prior knowledge about the nature of the environment. Our algorithm is based on augmentation of the EXP3 algorithm with a new control lever in the form of exploration parameters that are tailored individually for each arm. The algorithm simultaneously applies...
Best arm identification in multi-armed bandits with delayed feedback
We propose a generalization of the best arm identification problem in stochastic multiarmed bandits (MAB) to the setting where every pull of an arm is associated with delayed feedback. The delay in feedback increases the effective sample complexity of standard algorithms, but can be offset if we have access to partial feedback received before a pull is completed. We propose a general framework ...